DATA2002 • Semester 2 • Manav Khemchandani

The Price of a Dream

Exploring what drives the cost of a dream house in the city of dreams.

City skyline

Introduction

🏠 Property Size & Land Characteristics

Focus: Do bigger houses and lots actually mean higher prices?

  • Lot.Size — land parcel size
  • Land.Value — value of land alone
  • Living.Area — square footage of living space
  • Bedrooms — number of bedrooms
✨ Quality, Features & Amenities

Focus: Does having more amenities or newer construction raise house prices?

  • Bathrooms — number of bathrooms
  • Rooms — total number of rooms
  • Fireplaces
  • New.Construct — 1 = new construction, 0 = not new
🌇 Location & Environmental Factors

Focus: Do neighbourhood and environmental factors matter?

  • Waterfront — on waterfront or not
  • Pct.College — % adults with college education (proxy for affluence)
  • Sewer.Type — Public, Private, or None/Unknown
🔧 Age & Structural Characteristics

Focus: Do older homes lose value, and does the heating/cooling system influence price?

  • Age — age of the house
  • Central.Air — Yes/No
  • Fuel.Type — Gas, Oil, Electric
  • Heat.Type — Hot Air, Hot Water, Electric, etc.

1 Acreage vs. Apartments Showdown: Lots, Land, and the Cost Battle

1.1 Lot Size & Living Area

1.2 Bedrooms & Land Value

1.3 In conclusion

  • Living Area and Bedrooms are moderately correlated (≈ 0.66), suggesting the two carry overlapping information.
  • Land Value and Living Area are related (≈ 0.42) but both add explanatory power.
  • Overall, Living Area and Land Value dominate housing price variation, while Lot Size contributes little once these are accounted for.

2.1 Amenities, Features & Qualities

The Perks Premium: Do Amenities Add Value?

Box plot on amenities

Linear Regression results:

  1. \(\text{Price} = 42113.84 + 89371.81 \cdot \text{Bathrooms}\)

  2. \(\text{Price} = 171820.23 + 66743.97 \cdot \text{Fireplaces}\)

2.2 Rooms and New Construction

Box plot on Number of Rooms and Newly Constructed or not

Linear Regression results:

  1. \(\text{Price} = 52624.99 + 22616.82 \cdot \text{Rooms}\)

  2. \(\text{Price} = 208580.78 + 73726.04 \cdot \text{New.Construct}\)

2.3 Multivariable regression

Multivariable linear regression: How does the price differ when all variables are combined?

\(\text{Predicted Price} = -240.6 + 59749.4 \cdot \text{Bathrooms} + 19561.7 \cdot \text{Fireplaces} + 12322.5 \cdot \text{Rooms} + 663.1 \cdot \text{New.Construct}\)
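Reading the equation above is easier with one house plugged in. A minimal sketch, assuming the coefficients are exactly those reported on this slide; the helper name `predict_price` and the example house are hypothetical and not taken from the dataset.

```python
# Coefficients copied from the multivariable equation on this slide.
INTERCEPT = -240.6
COEFS = {
    "Bathrooms": 59749.4,
    "Fireplaces": 19561.7,
    "Rooms": 12322.5,
    "New.Construct": 663.1,
}

def predict_price(house):
    """Fitted price for a dict of predictor values (hypothetical helper)."""
    return INTERCEPT + sum(COEFS[name] * house[name] for name in COEFS)

# Hypothetical house: 2 bathrooms, 1 fireplace, 7 rooms, not newly built.
example = {"Bathrooms": 2, "Fireplaces": 1, "Rooms": 7, "New.Construct": 0}
print(round(predict_price(example), 1))
```

Note that the bathroom slope falls from about 89,372 in the single-variable fit to about 59,749 here, which is what one would expect if bathrooms share information with rooms and fireplaces.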

Log-linear regression: Are there any outliers?

\(\text{Multivariable } R^2 = 0.4341173\)

\(\text{Log-linear } R^2 = 0.4192378\)

3.1 Environmental & Neighbourhood Factors

Beyond the Walls: Location as Value

Key variables

  • Waterfront (binary: Yes/No)
  • Pct.College (% college-educated residents)
  • Sewer.Type (Public / Private / None-Unknown)

3.2 Price Relationships

Waterfront vs Price

Neighborhood Education Quality vs Price

3.3 In conclusion

  • T-tests show a significant waterfront premium

  • Regression confirms Pct.College is a strong positive predictor of price

  • ANOVA shows Public sewer systems linked to higher prices

  • Combined regression: all three remain independent and significant

  • Bottom line: Neighbourhood quality and infrastructure are systematically capitalised into housing prices.

Combined Effects on Housing Price

4.1 Age & Structural Characteristics vs. Price

Aging Gracefully? The Price of Time

4.2 Fuel Type & Heat Type

4.3 In conclusion

5.1 Cross Group Analysis

Figure 1: Correlation Matrix of each pair of variables

5.2 Investigation Assumptions

Figure 2: Assumptions plots before and after log transformation

5.3 Findings

Log-Adjusted Model: R-squared: ~\(0.49\), Adjusted R-Squared: ~\(0.49\)

\(\log(\text{Predicted Price})=11.2+4.01\times 10^{-4}\cdot \text{Living.Area}+0.14\cdot \text{Bathrooms}\)
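An intercept near 11.2 only makes sense if the fitted value is on the natural-log scale, so the sketch below assumes the response is log(Price). Under that assumption it back-transforms one prediction to dollars and turns each slope into an implied percentage change. The function names and the example house are hypothetical.

```python
import math

# Coefficients from the log-adjusted model on this slide.
B0, B_AREA, B_BATH = 11.2, 4.01e-4, 0.14

def predict_log_price(living_area, bathrooms):
    """Fitted value on the log scale (assumes natural log of price)."""
    return B0 + B_AREA * living_area + B_BATH * bathrooms

def percent_change(coef, step=1.0):
    """Implied % change in price for a `step` increase in one predictor,
    holding the other predictor fixed."""
    return (math.exp(coef * step) - 1) * 100

log_price = predict_log_price(1500, 2)        # hypothetical house
print(round(log_price, 4))                    # fitted value, log scale
print(round(math.exp(log_price)))             # back on the dollar scale
print(round(percent_change(B_BATH), 1))       # one extra bathroom
print(round(percent_change(B_AREA, 100), 1))  # 100 extra square feet
```

On this reading, the coefficients act multiplicatively: an extra bathroom scales the predicted price by a fixed percentage rather than adding a fixed dollar amount.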

Simple Comparison Model: R-squared: ~\(0.47\), Adjusted R-Squared: ~\(0.47\)

\(\log(\text{Predicted Price})=11.28+5.06\times 10^{-4}\cdot \text{Living.Area}\)

Interaction Model: R-squared: ~\(0.49\), Adjusted R-Squared: ~\(0.49\)

\(\log(\text{Predicted Price})=11.08+4.77\times 10^{-4}\cdot \text{Living.Area}+0.2\cdot \text{Bathrooms}-3.40\times 10^{-5}\cdot \text{Living.Area}\times\text{Bathrooms}\)
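With an interaction term, the effect of a bathroom is no longer a single number: it depends on living area. A minimal sketch of that dependence, using only the coefficients on this slide; the function name and the chosen areas are illustrative, and the fitted values are assumed to be on the log scale.

```python
# Coefficients from the interaction model on this slide.
B_AREA, B_BATH, B_INT = 4.77e-4, 0.2, -3.40e-5

def bathroom_effect(living_area):
    """Change in the fitted (log-scale) value for one extra bathroom
    at a given living area in square feet."""
    return B_BATH + B_INT * living_area

# Illustrative living areas, not rows from the dataset.
for area in (1000, 2000, 3000):
    print(area, round(bathroom_effect(area), 3))

# Living area at which the fitted bathroom effect would reach zero.
break_even_area = -B_BATH / B_INT
print(round(break_even_area))
```

The negative interaction coefficient means the fitted bathroom effect shrinks as homes get larger.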

Figure 3: Interactions Summary

Log-Adjusted VIF:
Living.Area   Bathrooms 
   2.067436    2.067436 
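The two VIFs are identical because, with exactly two predictors, each VIF equals 1 / (1 − r²), where r is the correlation between them. A short sketch that inverts this to recover the implied correlation; only the VIF value comes from the slide, and the sign of r cannot be recovered from a VIF alone.

```python
import math

VIF = 2.067436  # reported for both Living.Area and Bathrooms

# Two-predictor case: VIF = 1 / (1 - r^2), so r^2 = 1 - 1 / VIF.
r_squared = 1 - 1 / VIF
r = math.sqrt(r_squared)  # magnitude only; the sign is not identified

print(round(r_squared, 3), round(r, 3))
```

A VIF near 2 is well under the commonly quoted rule-of-thumb cut-offs of 5 or 10, so collinearity between the two predictors looks moderate rather than severe.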

Conclusion

\(R^2=0.5905676\)

Top 5 Biggest Factors

  • Living Area (per 1,000 \(ft^2\))
  • Land Value (per $10,000)
  • Bathrooms
  • Waterfront
  • Lot Size

\(\text{Predicted Price} = 1346.1 + 69205.5 \cdot \text{Living.Area} + 9133.5 \cdot \text{Land.Value} + 27682.6 \cdot \text{Bathrooms} + 123373.7 \cdot \text{Waterfront} + 7369.6 \cdot \text{Lot.Size}\)
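A sketch of how the final equation might be applied to a single house. It assumes, based on the "Top 5" list, that Living.Area enters in thousands of square feet and Land.Value in units of $10,000, and that Waterfront is coded 0/1. The units of Lot.Size are not stated on the slide, so it is passed through unchanged. The function name and the example house are hypothetical.

```python
def predict_final_price(living_area_sqft, land_value_usd, bathrooms,
                        waterfront, lot_size):
    """Fitted price from the final model (hypothetical helper).

    Assumed scaling: living area per 1,000 sq ft, land value per $10,000.
    `waterfront` is 1 for a waterfront property, else 0.
    `lot_size` is used in whatever unit the dataset stores it.
    """
    return (1346.1
            + 69205.5 * (living_area_sqft / 1000)
            + 9133.5 * (land_value_usd / 10000)
            + 27682.6 * bathrooms
            + 123373.7 * waterfront
            + 7369.6 * lot_size)

# Hypothetical house, not a row from the dataset.
inland = predict_final_price(1500, 30000, 2, 0, 0.5)
on_water = predict_final_price(1500, 30000, 2, 1, 0.5)
print(round(inland, 2), round(on_water - inland, 1))
```

Because the model is additive, the gap between the two predictions is just the Waterfront coefficient, whatever the other inputs are.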

\(\text{Multivariable } R^2 = 0.6354036\)

\(\text{Log-linear } R^2 = 0.5627405\)

DATA2002 • Semester 2 • GROUP L21G03

The City of Dreams

Our analysis confirms: The only way to afford a 2-bedroom apartment in Manhattan is if you are a fictional character from a 90s sitcom or the result of a statistically improbable outlier.
